2 research outputs found

    Localized convolutional neural networks for geospatial wind forecasting

    Convolutional Neural Networks (CNNs) possess many positive qualities when it comes to spatial raster data. Translation invariance enables CNNs to detect features regardless of their position in the scene. However, in some domains, like geospatial, not all locations are exactly equal. In this work, we propose localized convolutional neural networks that enable convolutional architectures to learn local features in addition to global ones. We investigate their instantiations in the form of learnable inputs, local weights, and a more general form. They can be added to any convolutional layer, are easily trained end-to-end, introduce minimal additional complexity, and let CNNs retain most of their benefits to the extent that they are needed. In this work we address spatio-temporal prediction: we test the effectiveness of our methods on a synthetic benchmark dataset and tackle three real-world wind prediction datasets. For one of them, we propose a method to spatially order the unordered data. We compare recent state-of-the-art spatio-temporal prediction models on the same data; those that use convolutional layers are extended with our localizations. In all these cases our extensions improve the results, often advancing the state of the art. We share all the code in a public repository.
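The "learnable inputs" instantiation mentioned above can be pictured as trainable per-location feature maps concatenated to the input raster as extra channels, so that a standard convolution can exploit location-specific information. The following is a minimal NumPy sketch of that idea under our own assumptions (shapes, function names, and the toy convolution are illustrative, not the authors' code):

```python
import numpy as np

def localize_inputs(x, local_params):
    """Append trainable per-location maps as extra input channels.
    x: (H, W, C) input raster; local_params: (H, W, K) learnable maps."""
    return np.concatenate([x, local_params], axis=-1)  # (H, W, C + K)

def conv2d_valid(x, kernel):
    """Minimal single-filter 'valid' 2D convolution (illustration only)."""
    H, W, _ = x.shape
    kh, kw, _ = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw, :] * kernel)
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 3))       # e.g. a small wind-field raster
local = rng.normal(size=(8, 8, 2))   # learnable location embeddings
kernel = rng.normal(size=(3, 3, 5))  # sees 3 input + 2 local channels
y = conv2d_valid(localize_inputs(x, local), kernel)
print(y.shape)  # (6, 6)
```

In a deep-learning framework the `local` array would simply be a trainable parameter updated by backpropagation alongside the kernels, which is why the scheme adds little complexity to existing architectures.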

    Optimization Techniques for Hestenes-Jacobi SVD on FPGAs

    No full text
    Matrix decomposition, such as the Singular Value Decomposition (SVD), is an important compute-intensive task in a wide variety of fields, from radar and simulation to image processing and compression. In light of growing data sizes, accelerators such as FPGAs are often considered for SVD, with the goal of increasing compute efficiency. However, achieving high-performance computation of SVD requires high parallelism. For that, a thorough reevaluation of the complexities involved in implementing the algorithm is necessary in light of hardware and algorithm advances. In this work, we investigate the Hestenes-Jacobi SVD (HJSVD) on FPGAs. Our findings show that the Hestenes-Jacobi method, while highly parallelizable, can become constrained by its hardware resource costs and requires careful tuning to achieve high throughput; the input matrix size and degree of parallelism have an important effect on efficient pipelining of the architecture. We identify the key challenges in parallelizing the algorithm for modern workloads and incorporate three key optimizations: pipelining, the use of fixed-point arithmetic instead of floating-point, and the use of heterogeneous resources, not only DSPs, for vector rotations.
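For context on what the hardware implements, the one-sided Hestenes-Jacobi method repeatedly rotates pairs of columns of the input matrix until all columns are mutually orthogonal; the singular values are then the column norms. Below is a generic textbook sketch in plain NumPy (none of the paper's FPGA-specific pipelining or fixed-point optimizations, and the function name and tolerances are our own assumptions):

```python
import numpy as np

def hestenes_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided Hestenes-Jacobi SVD sketch: A = U @ diag(s) @ V.T."""
    A = A.astype(float).copy()
    n = A.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = A[:, p] @ A[:, p]   # |a_p|^2
                beta = A[:, q] @ A[:, q]    # |a_q|^2
                gamma = A[:, p] @ A[:, q]   # inner product to annihilate
                if abs(gamma) > tol * np.sqrt(alpha * beta):
                    converged = False
                    # Jacobi rotation angle that orthogonalizes the pair
                    zeta = (beta - alpha) / (2.0 * gamma)
                    t = np.sign(zeta) / (abs(zeta) + np.sqrt(1.0 + zeta**2))
                    c = 1.0 / np.sqrt(1.0 + t**2)
                    s = c * t
                    J = np.array([[c, s], [-s, c]])
                    A[:, [p, q]] = A[:, [p, q]] @ J  # rotate the columns
                    V[:, [p, q]] = V[:, [p, q]] @ J  # accumulate V
        if converged:
            break
    sigma = np.linalg.norm(A, axis=0)  # singular values = column norms
    U = A / sigma                      # normalized columns give U
    return U, sigma, V

rng = np.random.default_rng(1)
A = rng.normal(size=(6, 4))
U, s, V = hestenes_jacobi_svd(A)
print(np.allclose(U * s @ V.T, A))  # True: reconstruction holds
```

The inner rotation of each column pair is independent of the other pairs in a sweep, which is the source of the parallelism the abstract refers to; on an FPGA each rotation maps naturally onto DSP blocks or other arithmetic resources.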